Standard CRC 32
Here we are again. Today let's talk about an interesting topic: CRC. I'll not deal with all the theory behind it, I'll only present the way of calculating the standard 'CRC 32 bits' that so many progams use (PKZIP, ARJ,...). If you want to learn why and how it works, have a look at an article by Ross Williams that is available at his homepage. What's the use of a CRC? Just to check if a file has been modified. It's used by most compressors and also some anti-virus programs that need it for error detection.
CRC is an abbreviation for Cyclic Redundancy Code. In the case of CRC 32 it's 32 bits long, so it's a double word, also named unsigned long in C. What we need is a table of dwords with 256 entries (the so-called CRC table). Moreover, we have to initialise a CRC value, for instance with 0FFFFFFFFh. Now we repeat the following procedure until no bytes are left in the input stream: xor the input value with the current CRC value, thus in the lower byte getting the position of the table value we have to read, shift the CRC value to the right 8 times, and xor it with the table value. In the end we xor the CRC value with 0FFFFFFFFh and get a 32-bit CRC value we can store along with the compressed data and later use in our decompressor in order to check the authencity of a file by performing the operation again and comparing the stored CRC value with the new one. If they aren't equal, the data is corrupted, possibly due to a media error.
It's as easy as that. A sample CRC table for Assembler is included in bonus.zip (crc32.inc). If you want it in another language, let me know (I even can send it to you in binary form).
Pseudo Code
Here is the pseudo code for computing a CRC value:
- Init CRC_value (0FFFFFFFFh)
- Loop (while there are bytes in input)
- Position = (byte_from_input XOR CRC_value) AND 0FFh
- table_value = CRC32_table [position]
- Shift the CRC value to the right 8 times
- CRC_value = CRC_Value XOR table_value
- Next input byte. Repeat.
- Xor CRC_value with end value. (0FFFFFFFFh)
Mind that we AND the result from the xor with the input byte and CRC value with 0FFh because we only want values between 0-255. Remember that CRC_value and table_value are both dwords.
Speed ups
First you should implement it in Assembler, so for every variable you use registers. Apart from that you should do some loop unrolling and try to improve the pairing. However, this implementation is quite fast because it only needs one lookup read for every byte (with a little luck this data is in the cache) and two xors.